Pipeline Settings - ChartsMaze EDL Pipeline

Overview

The EDL pipeline can be configured using three boolean flags at the top of run_full_pipeline.py. These settings control data fetching behavior, optional datasets, and cleanup operations.

Configuration Flags

All configuration flags are located in run_full_pipeline.py at lines 61-71:

# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════

# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True

# Set to True to also fetch standalone data (Indices, ETFs)
FETCH_OPTIONAL = False

# Auto-delete intermediate files after pipeline succeeds
# Keeps: all_stocks_fundamental_analysis.json.gz + ohlcv_data/
CLEANUP_INTERMEDIATE = True

FETCH_OHLCV

Type: boolean. Default: True.

Controls whether to fetch historical OHLCV (Open, High, Low, Close, Volume) data for all stocks.

Behavior:

True: Fetches lifetime OHLCV data using smart incremental updates
- First run: ~30 minutes (downloads full history from 1976)
- Subsequent runs: ~2-5 minutes (only fetches new data)
- Enables ADR, RVOL, ATH, and % from ATH calculations
False: Skips OHLCV fetching entirely
- Pipeline runs ~4 minutes faster
- Fields that depend on OHLCV will show 0 or null:
  - 5/14/20/30 Days MA ADR(%)
  - RVOL
  - % from ATH
  - Returns since Earnings(%)
  - Max Returns since Earnings(%)

When to disable:

Testing pipeline changes without needing price data
Running quick fundamental-only refreshes
Network bandwidth constraints

Files affected:

Creates/updates: ohlcv_data/{SYMBOL}.csv (one file per stock)
Creates/updates: indices_ohlcv_data/ directory for index data
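
The incremental update described above amounts to checking what is already on disk before fetching. The following is a minimal sketch of that idea, not the pipeline's actual code: the helper name next_fetch_start is hypothetical, and it assumes each ohlcv_data/{SYMBOL}.csv has a header row with a Date column holding ISO dates (YYYY-MM-DD).

```python
import csv
from datetime import date, timedelta
from pathlib import Path
from typing import Optional

OHLCV_DIR = Path("ohlcv_data")  # directory named in "Files affected" above


def next_fetch_start(symbol: str, ohlcv_dir: Path = OHLCV_DIR) -> Optional[date]:
    """Return the first date that still needs fetching, or None for a full download.

    Assumption: the CSV has a 'Date' column holding ISO dates (YYYY-MM-DD).
    """
    csv_path = ohlcv_dir / f"{symbol}.csv"
    if not csv_path.exists():
        return None  # no local history yet: first run, fetch everything

    last_seen: Optional[date] = None
    with csv_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            current = date.fromisoformat(row["Date"])
            if last_seen is None or current > last_seen:
                last_seen = current

    if last_seen is None:
        return None  # file exists but holds no rows
    return last_seen + timedelta(days=1)  # resume the day after the newest row
```

A return value of None corresponds to the slow first run; any date corresponds to the faster incremental path.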

FETCH_OPTIONAL

Type: boolean. Default: False.

Enables fetching of standalone datasets not included in the main pipeline output.

Behavior:

True: Runs PHASE 6 scripts to fetch:
- All market indices (all_indices_list.json) - 194 indices
- ETF data (etf_data_response.json) - 361 ETFs
False: Skips PHASE 6 entirely

What gets fetched:

| Script | Output File | Records | Description |
| --- | --- | --- | --- |
| `fetch_all_indices.py` | `all_indices_list.json` | 194 | Nifty 50, Bank Nifty, sectoral indices |
| `fetch_etf_data.py` | `etf_data_response.json` | 361 | All exchange-traded funds |

Note: These files are standalone and are not merged into all_stocks_fundamental_analysis.json.gz. The frontend uses them separately for index tracking and ETF screening.

When to enable:

You need fresh index composition data
Building ETF comparison features
Running a full data refresh for all asset classes
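
How the flag gates PHASE 6 can be sketched as below. This is an illustration under stated assumptions rather than the code in run_full_pipeline.py: the two script names come from this section, while run_script, run_optional_phase, and the injectable runner parameter are hypothetical names introduced for the example.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Script names as listed in this section.
OPTIONAL_SCRIPTS = ["fetch_all_indices.py", "fetch_etf_data.py"]


def run_script(script: str) -> None:
    """Run one script with the current interpreter; raise if it exits non-zero."""
    subprocess.run([sys.executable, script], check=True)


def run_optional_phase(
    fetch_optional: bool,
    scripts: Sequence[str] = OPTIONAL_SCRIPTS,
    runner: Callable[[str], None] = run_script,
) -> List[str]:
    """Run the PHASE 6 scripts only when the flag is on; return what was run."""
    if not fetch_optional:
        return []  # flag off: PHASE 6 is skipped entirely
    executed: List[str] = []
    for script in scripts:
        runner(script)
        executed.append(script)
    return executed
```

Passing the runner as a parameter keeps the gating logic testable without actually launching the fetch scripts.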

CLEANUP_INTERMEDIATE

Type: boolean. Default: True.

Auto-deletes intermediate files after successful pipeline completion.

Behavior:

True: Removes all intermediate files and directories after compression
- Keeps only: *.json.gz files + ohlcv_data/ + indices_ohlcv_data/
- Frees ~150-200 MB of disk space
False: Preserves all intermediate files for debugging

Files deleted when enabled:

INTERMEDIATE_FILES = [
    "master_isin_map.json",
    "dhan_data_response.json",
    "fundamental_data.json",
    "advanced_indicator_data.json",
    "all_company_announcements.json",
    "upcoming_corporate_actions.json",
    "history_corporate_actions.json",
    "nse_asm_list.json",
    "nse_gsm_list.json",
    "bulk_block_deals.json",
    "upper_circuit_stocks.json",
    "lower_circuit_stocks.json",
    "incremental_price_bands.json",
    "complete_price_bands.json",
    "nse_equity_list.csv",
    "all_stocks_fundamental_analysis.json",  # Raw JSON (after .gz is made)
]

INTERMEDIATE_DIRS = [
    "company_filings/",
    "market_news/",
]
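
A cleanup step driven by two lists like these could look roughly like the sketch below. It is an assumption about the general approach, not the pipeline's own implementation: cleanup_intermediate is a hypothetical helper, and it takes the file and directory lists as arguments so the block stands alone.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def cleanup_intermediate(
    base_dir: Path, files: Iterable[str], dirs: Iterable[str]
) -> List[str]:
    """Delete the listed files and directories under base_dir.

    Entries that do not exist are skipped silently, so the function is safe
    to call after a run that did not produce every intermediate output.
    Returns the names that were actually removed.
    """
    removed: List[str] = []
    for name in files:
        target = base_dir / name
        if target.is_file():
            target.unlink()
            removed.append(name)
    for name in dirs:
        target = base_dir / name
        if target.is_dir():
            shutil.rmtree(target)  # removes the directory and its contents
            removed.append(name)
    return removed
```

Because only names on the two lists are touched, the compressed output and the OHLCV directories are left in place.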

When to disable:

Debugging pipeline failures
Inspecting intermediate data quality
Running custom analysis on raw outputs
Developing new pipeline stages

Modifying Configuration

Open the pipeline runner

Navigate to the EDL Pipeline directory:

cd "DO NOT DELETE EDL PIPELINE"

Edit run_full_pipeline.py

Open the file in your editor:

nano run_full_pipeline.py
# or
vim run_full_pipeline.py

Update the flags (lines 64-71)

Modify the values according to your needs:

FETCH_OHLCV = True           # Set to False to skip OHLCV
FETCH_OPTIONAL = True        # Set to True to fetch indices & ETFs
CLEANUP_INTERMEDIATE = False # Set to False to keep intermediate files

Save and run the pipeline

python3 run_full_pipeline.py
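
Before a long run it can help to confirm which values are actually set. The sketch below reads the three flags from Python source text using the standard ast module, without importing or executing the pipeline. read_flags is a hypothetical helper, and it assumes the flags are plain top-level assignments of True or False, as in the snippet above.

```python
import ast
from typing import Dict

FLAG_NAMES = ("FETCH_OHLCV", "FETCH_OPTIONAL", "CLEANUP_INTERMEDIATE")


def read_flags(source: str) -> Dict[str, bool]:
    """Extract top-level boolean flag assignments from Python source text."""
    flags: Dict[str, bool] = {}
    for node in ast.parse(source).body:
        # Only simple single-target assignments such as NAME = True are considered.
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if (
            isinstance(target, ast.Name)
            and target.id in FLAG_NAMES
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, bool)
        ):
            flags[target.id] = node.value.value
    return flags
```

For example, calling read_flags on the text of run_full_pipeline.py (read with pathlib.Path.read_text) returns a dictionary mapping each flag name to its current value.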

Common Configuration Scenarios

Quick Fundamental Refresh (No OHLCV)

FETCH_OHLCV = False
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = True

Runtime: ~4 minutes
Use case: Testing, quick fundamental updates

Full Production Refresh

FETCH_OHLCV = True
FETCH_OPTIONAL = True
CLEANUP_INTERMEDIATE = True

Runtime: ~35 minutes (first run), ~8 minutes (incremental)
Use case: Daily automated refresh, complete data update

Development/Debugging Mode

FETCH_OHLCV = True
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = False

Runtime: ~30 minutes (first run), ~6 minutes (incremental)
Use case: Inspecting intermediate outputs, debugging pipeline stages

Impact on Output Fields

When FETCH_OHLCV = False, the following fields in all_stocks_fundamental_analysis.json.gz will be 0 or null:

| Field | Default Value (No OHLCV) |
| --- | --- |
| `5 Days MA ADR(%)` | `0` |
| `14 Days MA ADR(%)` | `0` |
| `20 Days MA ADR(%)` | `0` |
| `30 Days MA ADR(%)` | `0` |
| `RVOL` | `0` |
| `% from ATH` | `0` |
| `Returns since Earnings(%)` | `0` |
| `Max Returns since Earnings(%)` | `0` |

All other fundamental, technical indicator, and news fields remain unaffected.
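
To see which records were written without price data, the output can be scanned for these fields. The sketch below is hedged on one point the page does not state: it assumes the archive decodes to a JSON list of objects keyed by the field names listed above. load_records and empty_ohlcv_fields are hypothetical helper names.

```python
import gzip
import json
from typing import Any, Dict, List

# Field names as listed in this section.
OHLCV_FIELDS = [
    "5 Days MA ADR(%)",
    "14 Days MA ADR(%)",
    "20 Days MA ADR(%)",
    "30 Days MA ADR(%)",
    "RVOL",
    "% from ATH",
    "Returns since Earnings(%)",
    "Max Returns since Earnings(%)",
]


def load_records(path: str) -> List[Dict[str, Any]]:
    """Load the gzip-compressed JSON output (assumed: a list of per-stock objects)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def empty_ohlcv_fields(record: Dict[str, Any]) -> List[str]:
    """Return the OHLCV-dependent fields that are 0, null, or absent in one record."""
    return [name for name in OHLCV_FIELDS if record.get(name) in (0, None)]
```

A record for which empty_ohlcv_fields returns all eight names was most likely produced with FETCH_OHLCV = False.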
